Abstract

The paper is concerned with stochastic approximation procedures having
three main characteristics: truncations with random moving bounds, a matrix-
valued random step-size sequence, and a dynamically changing random regression
function. We study convergence and rate of convergence. The main results are
supplemented with corollaries that establish various sets of sufficient conditions,
with the main emphasis on parametric statistical estimation. The theory
is illustrated by examples and special cases.
1 Introduction

This paper is a continuation of Sharia (2014), where a large class of truncated
stochastic approximation (SA) procedures with moving random bounds was proposed.
Although the proposed class of procedures can be applied to a wider range of problems,
our main motivation comes from applications to parametric statistical estimation
theory. To make this paper self-contained, we introduce the main ideas below (a full
list of references as well as some comparisons can be found in Sharia (2014)).

The main idea can be easily explained in the case of the classical problem of
finding a unique zero, say $z^0$, of a real valued function $R(z) : \mathbb{R} \to \mathbb{R}$ when only
noisy measurements of $R$ are available. To estimate $z^0$, consider a sequence defined
recursively as

$$Z_t = Z_{t-1} + \gamma_t \left[ R(Z_{t-1}) + \varepsilon_t \right], \qquad t = 1, 2, \dots \tag{1.1}$$




where $\{\varepsilon_t\}$ is a sequence of zero-mean random variables and $\{\gamma_t\}$ is a deterministic
sequence of positive numbers. This is the classical Robbins-Monro SA procedure (see
Robbins and Monro (1951)), which under certain conditions converges to the root $z^0$
of the equation $R(z) = 0$. (Comprehensive surveys of the SA technique can be found
in Benveniste et al. (1990), Borkar (2008), Kushner and Yin (2003), Lai (2003), and
Kushner (2010).)
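As an illustration of recursion (1.1), the following minimal sketch simulates the Robbins-Monro procedure with $\gamma_t = 1/t$. The regression function `R_example`, its root 2, the Gaussian noise and the seed are assumptions made only for this example; they are not taken from the paper.

```python
import random


def robbins_monro(R, z_init, n_steps, noise_sd=1.0, seed=0):
    """Iterate Z_t = Z_{t-1} + gamma_t * (R(Z_{t-1}) + eps_t) with gamma_t = 1/t."""
    rng = random.Random(seed)
    z = z_init
    for t in range(1, n_steps + 1):
        eps = rng.gauss(0.0, noise_sd)  # zero-mean measurement error
        z = z + (1.0 / t) * (R(z) + eps)
    return z


def R_example(z):
    # Hypothetical regression function with the unique root z0 = 2.
    return -(z - 2.0)


z_hat = robbins_monro(R_example, z_init=0.0, n_steps=20000)
print(z_hat)
```

For this particular linear choice of $R$ the iterate $Z_t$ coincides with the running average of the noisy observations $2 + \varepsilon_s$, which is why the estimate settles near the root.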

Statistical parameter estimation is one of the most important applications of the
above procedure. Indeed, suppose that $X_1, \dots, X_t$ are i.i.d. random variables and
$f(x, \theta)$ is the common probability density function (w.r.t. some $\sigma$-finite measure),
where $\theta \in \mathbb{R}^m$ is an unknown parameter. Consider a recursive estimation procedure
for $\theta$ defined by

$$\hat\theta_t = \hat\theta_{t-1} + \frac{1}{t}\, i(\hat\theta_{t-1})^{-1} \frac{f'^{T}(X_t, \hat\theta_{t-1})}{f(X_t, \hat\theta_{t-1})}, \qquad t \ge 1, \tag{1.2}$$


where $\hat\theta_0 \in \mathbb{R}^m$ is some starting value and $i(\theta)$ is the one-step Fisher information
matrix ($f'$ is the row-vector of partial derivatives of $f$ w.r.t. the components of $\theta$).
This estimator was introduced in Sakrison (1965) and studied by a number of authors
(see, e.g., Polyak and Tsypkin (1980), Campbell (1982), Ljung and Soderstrom (1987),
Lazrieve and Toronjadze (1987), Englund et al. (1989), Lazrieve et al. (1997, 2008),
Sharia (1997-2010)). In particular, it has been shown that under certain conditions,
the recursive estimator $\hat\theta_t$ is asymptotically equivalent to the maximum likelihood
estimator, i.e., it is consistent and asymptotically efficient. One can analyse (1.2) by
rewriting it in the form of stochastic approximation with $\gamma_t = 1/t$,

$$R(z) = i(z)^{-1} E_\theta \left\{ \frac{f'^{T}(X_t, z)}{f(X_t, z)} \right\} \quad \text{and} \quad \varepsilon_t = i(\hat\theta_{t-1})^{-1} \frac{f'^{T}(X_t, \hat\theta_{t-1})}{f(X_t, \hat\theta_{t-1})} - R(\hat\theta_{t-1}),$$

where $\theta$ is an arbitrary but fixed value of the unknown parameter. Indeed, under
certain standard assumptions, $R(\theta) = 0$ and $\{\varepsilon_t\}$ is a martingale difference w.r.t.
the filtration $\{\mathcal{F}_t\}$ generated by $\{X_t\}$. So, (1.2) is a standard SA of type (1.1).
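To make recursion (1.2) concrete, here is a minimal sketch for a scalar parameter. The model $X \sim N(\theta, 1)$, for which $f'/f = x - \theta$ and $i(\theta) = 1$, is an assumption chosen for the example, and the helper names are hypothetical.

```python
import random


def recursive_mle(xs, theta_init, score, fisher_info):
    """theta_t = theta_{t-1} + (1/t) * i(theta_{t-1})^{-1} * score(X_t, theta_{t-1})."""
    theta = theta_init
    for t, x in enumerate(xs, start=1):
        theta = theta + (1.0 / t) * score(x, theta) / fisher_info(theta)
    return theta


# Assumed model: X ~ N(theta, 1), so f'/f = x - theta and i(theta) = 1.
def normal_score(x, theta):
    return x - theta


def normal_info(theta):
    return 1.0


rng = random.Random(1)
sample = [rng.gauss(3.0, 1.0) for _ in range(1000)]
theta_hat = recursive_mle(sample, 0.0, normal_score, normal_info)
sample_mean = sum(sample) / len(sample)  # maximum likelihood estimator in this model
print(theta_hat, sample_mean)
```

In this simple model the recursion reproduces the sample mean exactly (up to rounding), so the recursive estimator and the maximum likelihood estimator coincide; in general only asymptotic equivalence is claimed.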

Suppose now that we have a stochastic process $X_1, X_2, \dots$ and let $f_t(x, \theta) =
f_t(x, \theta \mid X_1, \dots, X_{t-1})$ be the conditional probability density function of the
observation $X_t$ given $X_1, \dots, X_{t-1}$, where $\theta \in \mathbb{R}^m$ is an unknown parameter. Then one can
define a recursive estimator of $\theta$ by

$$\hat\theta_t = \hat\theta_{t-1} + \gamma_t(\hat\theta_{t-1})\, \psi_t(\hat\theta_{t-1}), \qquad t \ge 1, \tag{1.3}$$

where $\psi_t(\theta) = \psi_t(X_1, \dots, X_t; \theta)$, $t = 1, 2, \dots$, are suitably chosen functions which
may, in general, depend on the vector of all past and present observations $X_1, \dots, X_t$,
and have the property that the process $\psi_t(\theta)$ is a $P^\theta$-martingale difference, i.e.,
$E_\theta\{\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = 0$ for each $t$. For example, a choice


$$\psi_t(\theta) = l_t(\theta) \equiv \frac{[f_t'(X_t, \theta)]^T}{f_t(X_t, \theta)}$$

yields a likelihood type estimation procedure. In general, to obtain an estimator
with asymptotically optimal properties, state-dependent matrix-valued random
step-size sequences are needed (see Sharia (2010)). For the above procedure, a
step-size sequence $\gamma_t(\theta)$ with the property

$$\gamma_t^{-1}(\theta) - \gamma_{t-1}^{-1}(\theta) = E_\theta\{\psi_t(\theta)\, l_t^T(\theta) \mid \mathcal{F}_{t-1}\}$$

is an optimal choice. For example, to derive a recursive procedure which is
asymptotically equivalent to the maximum likelihood estimator, we need to take

$$\psi_t(\theta) = l_t(\theta) \quad \text{and} \quad \gamma_t(\theta) = I_t^{-1}(\theta),$$


where

$$I_t(\theta) = \sum_{s=1}^{t} E_\theta\{ l_s(\theta)\, l_s^T(\theta) \mid \mathcal{F}_{s-1} \} \tag{1.4}$$

is the conditional Fisher information matrix. To rewrite (1.3) in the SA form, let us
assume that $\theta$ is an arbitrary but fixed value of the parameter and define

$$R_t(z) = E_\theta\{\psi_t(X_t, z) \mid \mathcal{F}_{t-1}\} \quad \text{and} \quad \varepsilon_t(z) = \psi_t(X_t, z) - R_t(z).$$

Then, since $\psi_t(\theta)$ is a $P^\theta$-martingale difference, it follows that $R_t(\theta) = 0$ for each $t$.
So, the objective now is to find a common root $\theta$ of a dynamically changing sequence
of functions $R_t$.
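A non-i.i.d. instance of (1.3) with $\psi_t = l_t$ and $\gamma_t = I_t^{-1}$ can be sketched as follows. The Gaussian AR(1) model $X_t = \theta X_{t-1} + \xi_t$ with unit-variance noise is an assumption made for the example; under it $l_t(\theta) = X_{t-1}(X_t - \theta X_{t-1})$ and $I_t = \sum_{s \le t} X_{s-1}^2$.

```python
import random


def recursive_ar1(xs, theta_init=0.0):
    """Recursion (1.3) with psi_t = l_t and gamma_t = I_t^{-1} for a Gaussian AR(1) model."""
    theta = theta_init
    info = 0.0  # conditional Fisher information I_t = sum of X_{s-1}^2
    for t in range(1, len(xs)):
        prev, cur = xs[t - 1], xs[t]
        info += prev * prev
        score = prev * (cur - theta * prev)  # l_t(theta) for unit-variance noise
        theta = theta + score / info
    return theta


rng = random.Random(2)
true_theta = 0.6
path = [1.0]  # nonzero start so that I_1 > 0
for _ in range(2000):
    path.append(true_theta * path[-1] + rng.gauss(0.0, 1.0))

theta_rec = recursive_ar1(path)
# Batch least-squares (conditional maximum likelihood) estimator for comparison.
theta_ls = sum(a * b for a, b in zip(path[:-1], path[1:])) / sum(a * a for a in path[:-1])
print(theta_rec, theta_ls)
```

Here the step-size is random and depends on the observed path, and the recursion reproduces the batch least-squares estimator, which is the conditional maximum likelihood estimator in this assumed model.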

Before introducing the general SA process, let us consider one simple modification
of the classical SA procedure. Suppose that we have additional information about
the root $z^0$ of the equation $R(z) = 0$. Let us, e.g., assume that $z^0 \in [\alpha_t, \beta_t]$ at each
step $t$, where $\alpha_t$ and $\beta_t$ are random variables such that $-\infty \le \alpha_t \le \beta_t \le \infty$. Then
one can consider a procedure which at each step $t$ produces points from the interval
$[\alpha_t, \beta_t]$. For example, a truncated classical SA procedure in this case can be derived
using the following recursion

$$Z_t = \Phi_{[\alpha_t, \beta_t]}\bigl( Z_{t-1} + \gamma_t [R(Z_{t-1}) + \varepsilon_t] \bigr), \qquad t = 1, 2, \dots$$

where $\Phi$ is the truncation operator, that is, for any $-\infty \le a \le b \le \infty$,

$$\Phi_{[a,b]}(z) = \begin{cases} a & \text{if } z < a, \\ z & \text{if } a \le z \le b, \\ b & \text{if } z > b. \end{cases}$$
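The truncated recursion can be sketched in a few lines. The cubic regression function with root 1, the expanding bounds $[-\log(t+1), \log(t+1)]$ and the noise model are assumptions for illustration only; the names are hypothetical.

```python
import math
import random


def truncate(z, a, b):
    """Truncation operator Phi_[a,b](z): clip z to the interval [a, b]."""
    if z < a:
        return a
    if z > b:
        return b
    return z


def truncated_sa(R, z_init, bounds, n_steps, noise_sd=1.0, seed=0):
    """Z_t = Phi_[alpha_t, beta_t](Z_{t-1} + (1/t) * (R(Z_{t-1}) + eps_t))."""
    rng = random.Random(seed)
    z, path = z_init, []
    for t in range(1, n_steps + 1):
        alpha_t, beta_t = bounds(t)
        step = (1.0 / t) * (R(z) + rng.gauss(0.0, noise_sd))
        z = truncate(z + step, alpha_t, beta_t)
        path.append(z)
    return path


def R_cubic(z):
    # Hypothetical regression function with root z0 = 1 and cubic growth.
    return -(z - 1.0) ** 3


def expanding_bounds(t):
    # Assumed moving bounds; they contain the root 1 for all t >= 2.
    return (-math.log(t + 1), math.log(t + 1))


path = truncated_sa(R_cubic, z_init=0.0, bounds=expanding_bounds, n_steps=1000)
print(path[-1])
```

The bounds keep every iterate inside $[\alpha_t, \beta_t]$ even though the cubic $R$ grows faster than linearly, which is the situation where an untruncated procedure can be thrown far from the root by an early large step.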

Truncated procedures may be useful in a number of circumstances. For example,
if the functions in the recursive equation are defined only for certain values of the
parameter, then the procedure should produce points only from this set. Truncations
may also be useful when certain standard assumptions, e.g., conditions on the growth
rate of the relevant functions, are not satisfied. Truncations may also help to make
efficient use of auxiliary information concerning the value of the unknown parameter.
For example, we might have auxiliary information about the parameters, e.g., a
set, possibly time dependent, that contains the value of the unknown parameter.
Also, sometimes a consistent but not necessarily efficient auxiliary estimator $\tilde\theta_t$ is
available having a rate $d_t$. Then, to obtain an asymptotically efficient estimator, one can
construct a procedure with shrinking bounds by truncating the recursive procedure
in a neighbourhood of $\tilde\theta_t$ with $[\alpha_t, \beta_t] = [\tilde\theta_t - \delta_t, \tilde\theta_t + \delta_t]$, where $\delta_t \to 0$.
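The shrinking-bounds construction can be sketched as follows. The normal location model, the running sample median as the auxiliary estimator and the half-width $\delta_t = t^{-1/4}$ are all assumptions picked for the example, not choices made in the paper.

```python
import random
import statistics


def truncate(z, a, b):
    """Clip z to the interval [a, b]."""
    return min(max(z, a), b)


def estimate_with_shrinking_bounds(xs, theta_init=0.0):
    """Recursive mean estimator truncated to [aux_t - delta_t, aux_t + delta_t]."""
    theta, seen = theta_init, []
    aux, delta = theta_init, 1.0
    for t, x in enumerate(xs, start=1):
        seen.append(x)
        aux = statistics.median(seen)  # consistent auxiliary estimator (assumed choice)
        delta = t ** -0.25             # shrinking half-width, delta_t -> 0 (assumed rate)
        theta = truncate(theta + (x - theta) / t, aux - delta, aux + delta)
    return theta, aux, delta


rng = random.Random(3)
data = [rng.gauss(1.0, 1.0) for _ in range(500)]
theta_trunc, aux_final, delta_final = estimate_with_shrinking_bounds(data)
print(theta_trunc, aux_final, delta_final)
```

The auxiliary estimator only has to localise the parameter; the recursive step inside the bounds is what is meant to deliver efficiency.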

Note that the idea of truncations is not new and goes back to Khasminskii and
Nevelson (1972) and Fabian (1978) (see also Chen and Zhu (1986), Chen et al. (1987),
Andradottir (1995), Sharia (1997), Tadic (1997, 1998), Lelong (2008)). A comprehensive
bibliography and some comparisons can be found in Sharia (2014).

In order to study these procedures in a unified manner, Sharia (2014) introduced
an SA of the following form

$$Z_t = \Phi_{U_t}\Bigl( Z_{t-1} + \gamma_t(Z_{t-1}) \bigl[ R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1}) \bigr] \Bigr), \qquad t = 1, 2, \dots$$

where $Z_0 \in \mathbb{R}^m$ is some starting value, $R_t(z)$ is a predictable process with the property
that $R_t(z^0) = 0$ for all $t$'s, $\gamma_t(z)$ is a matrix-valued predictable step-size sequence, and
$U_t \subset \mathbb{R}^m$ is a random sequence of truncation sets (see Section 2 for details). These
SA procedures have the following main characteristics: (1) inhomogeneous random
functions $R_t$; (2) state dependent matrix valued random step-sizes; (3) truncations
with random and moving (shrinking or expanding) bounds. The main motivation
for these comes from parametric statistical applications: (1) is needed for recursive
parameter estimation procedures for non i.i.d. models; (2) is required to guarantee
asymptotic optimality and efficiency of statistical estimation; (3) is needed for various
different adaptive truncations, in particular, for the ones arising from auxiliary
estimators.
Convergence of the above class of procedures is studied in Sharia (2014). In this
paper we present new results on rate of convergence. Furthermore, we present a
convergence result which generalises the corresponding result in Sharia (2014) by
considering time dependent random Lyapunov type functions (see Lemma 3.1). This
generalisation turns out to be quite useful as it can be used to derive convergence
results of the recursive parameter estimators in time series models. Some of the
conditions in the main statements are difficult to interpret. Therefore, we discuss
these conditions in explanatory remarks and corollaries. The corollaries are presented
in such a way that each subsequent statement imposes conditions that are
more restrictive than the previous one. We discuss the case of the classical SA and
demonstrate that conditions introduced in this paper are minimal in the sense that
they do not impose any additional restrictions when applied to the classical case.
We also compare our set of conditions to that of Kushner-Clark's setting (see
Remark 4.4). Furthermore, the paper contains new results even for the classical SA. In
particular, truncations with moving bounds give a possibility to use SA in the cases
when the standard conditions on the function $R$ do not hold. Also, an interesting
link between the rate of the step-size sequence and the rate of convergence of the
SA process is given in the classical case (see Corollary 4.7 and Remark 4.8). This
observation might not surprise experts working in this field, but we failed to find it
in a written form in the existing literature.

2 Main objects and notation 

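Let $(\Omega, \mathcal{F}, F = (\mathcal{F}_t)_{t \ge 0}, P)$ be a stochastic basis satisfying the usual conditions.
Suppose that for each $t = 1, 2, \dots$, we have $(\mathcal{B}(\mathbb{R}^m) \times \mathcal{F})$-measurable functions

$$R_t(z) = R_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R}^m,$$
$$\varepsilon_t(z) = \varepsilon_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R}^m,$$
$$\gamma_t(z) = \gamma_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R}^{m \times m},$$

such that for each $z \in \mathbb{R}^m$, the processes $R_t(z)$ and $\gamma_t(z)$ are predictable, i.e., $R_t(z)$
and $\gamma_t(z)$ are $\mathcal{F}_{t-1}$ measurable for each $t$. Suppose also that for each $z \in \mathbb{R}^m$, the
process $\varepsilon_t(z)$ is a martingale difference, i.e., $\varepsilon_t(z)$ is $\mathcal{F}_t$ measurable and
$E\{\varepsilon_t(z) \mid \mathcal{F}_{t-1}\} = 0$. We also assume that

$$R_t(z^0) = 0$$

for each $t = 1, 2, \dots$, where $z^0 \in \mathbb{R}^m$ is a non-random vector.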

Suppose that $h = h(z)$ is a real valued function of $z \in \mathbb{R}^m$. Denote by $h'(z)$
the row-vector of partial derivatives of $h$ with respect to the components of $z$, that
is, $h'(z) = \left( \frac{\partial}{\partial z_1} h(z), \dots, \frac{\partial}{\partial z_m} h(z) \right)$. Also, we denote by $h''(z)$ the matrix of second
partial derivatives. The $m \times m$ identity matrix is denoted by $I$. Denote by $[a]^+$
and $[a]^-$ the positive and negative parts of $a \in \mathbb{R}$, i.e., $[a]^+ = \max(a, 0)$ and
$[a]^- = -\min(a, 0)$, so that $a = [a]^+ - [a]^-$.

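Let $U \subset \mathbb{R}^m$ be a closed convex set and define a truncation operator as a function
$\Phi_U(z) : \mathbb{R}^m \to \mathbb{R}^m$, such that

$$\Phi_U(z) = \begin{cases} z & \text{if } z \in U, \\ z^* & \text{if } z \notin U, \end{cases}$$

where $z^*$ is a point in $U$ that minimizes the distance to $z$.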
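For a concrete closed convex set the operator $\Phi_U$ is easy to write down. The sketch below takes $U$ to be an axis-aligned box, an assumed example in which the nearest point in Euclidean distance is obtained by clipping each coordinate separately; the function name is hypothetical.

```python
def project_onto_box(z, lower, upper):
    """Phi_U(z) for the box U = [lower_1, upper_1] x ... x [lower_m, upper_m]."""
    return [min(max(zi, lo), hi) for zi, lo, hi in zip(z, lower, upper)]


inside = project_onto_box([0.2, 0.7], [0.0, 0.0], [1.0, 1.0])
outside = project_onto_box([0.5, 2.0, -3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
print(inside, outside)
```

Points of $U$ are left unchanged, and a point outside $U$ is moved to the closest point of the box.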

Suppose that $z^0 \in \mathbb{R}^m$. We say that a random sequence of sets $U_t = U_t(\omega)$
($t = 1, 2, \dots$) from $\mathbb{R}^m$ is admissible for $z^0$ if

• for each $t$ and $\omega$, $U_t(\omega)$ is a closed convex subset of $\mathbb{R}^m$;

• for each $t$ and $z \in \mathbb{R}^m$, the truncation $\Phi_{U_t}(z)$ is $\mathcal{F}_t$ measurable;

• $z^0 \in U_t$ eventually, i.e., for almost all $\omega$ there exists $t_0(\omega) < \infty$ such that $z^0 \in U_t(\omega)$
whenever $t > t_0(\omega)$.

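Assume that $Z_0 \in \mathbb{R}^m$ is some starting value and consider the procedure

$$Z_t = \Phi_{U_t}\bigl( Z_{t-1} + \gamma_t(Z_{t-1})\, \Psi_t(Z_{t-1}) \bigr), \qquad t = 1, 2, \dots \tag{2.1}$$

where $U_t$ is admissible for $z^0$,

$$\Psi_t(z) = R_t(z) + \varepsilon_t(z),$$

and $R_t(z)$, $\varepsilon_t(z)$, $\gamma_t(z)$ are random fields defined above. Everywhere in this work, we
assume that

$$E\{\Psi_t(Z_{t-1}) \mid \mathcal{F}_{t-1}\} = R_t(Z_{t-1}) \tag{2.2}$$

and

$$E\{\varepsilon_t^T(Z_{t-1})\, \varepsilon_t(Z_{t-1}) \mid \mathcal{F}_{t-1}\} = \bigl[ E\{\varepsilon_t^T(z)\, \varepsilon_t(z) \mid \mathcal{F}_{t-1}\} \bigr]_{z = Z_{t-1}}, \tag{2.3}$$

and the conditional expectations (2.2) and (2.3) are assumed to be finite.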


Remark 2.1 Condition (2.2) ensures that $\varepsilon_t(Z_{t-1})$ is a martingale difference.
Conditions (2.2) and (2.3) obviously hold if, e.g., the measurement errors $\varepsilon_t(u)$ are
independent random variables, or if they are state independent. In general, since we
assume that all conditional expectations are calculated as integrals w.r.t. corresponding
regular conditional probability measures (see the convention below), these
conditions can be checked using the disintegration formula (see, e.g., Theorem 5.4 in
Kallenberg (2002)).
We say that a random field

$$V_t(z) = V_t(z, \omega) : \mathbb{R}^m \times \Omega \to \mathbb{R} \qquad (t = 1, 2, \dots)$$

is a Lyapunov random field if

• $V_t(z)$ is a predictable process for each $z \in \mathbb{R}^m$;

• for each $t$ and almost all $\omega$, $V_t(z)$ is a non-negative function with continuous and
bounded partial second derivatives.

Convention. 

• Everywhere in the present work convergence and all relations between random
variables are meant with probability one w.r.t. the measure $P$ unless specified otherwise.

• A sequence of random variables $(\zeta_t)_{t \ge 1}$ has a property eventually if for every $\omega$
in a set of $P$ probability 1, the realisation $\zeta_t(\omega)$ has this property for all $t$ greater
than some $t_0(\omega) < \infty$.

• All conditional expectations are calculated as integrals w.r.t. corresponding regular
conditional probability measures.

• The infimum $\inf_{z \in U} h(z)$ of a real valued function $h(z)$ is 1 whenever $U = \emptyset$.

3 Convergence and rate of convergence 

We start this section with a convergence lemma, which uses the concept of a Lyapunov
random field (see Section 2). The proof of this lemma is very similar to that
presented in Sharia (2014). However, the dynamically changing Lyapunov functions
make it possible to apply this result to derive the rate of convergence of the SA
procedures. Also, this result turns out to be very useful to derive convergence of the
recursive parameter estimators in time series models.

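Lemma 3.1 Suppose that $Z_t$ is a process defined by (2.1). Let $V_t(u)$ be a Lyapunov
random field. Denote $\Delta_t = Z_t - z^0$, $\Delta V_t(u) = V_t(u) - V_{t-1}(u)$, and assume that

(V1)
$$V_t(\Delta_t) \le V_t\bigl( \Delta_{t-1} + \gamma_t(Z_{t-1}) [R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})] \bigr)$$
eventually;

(V2)
$$\sum_{t=1}^{\infty} \bigl( 1 + V_{t-1}(\Delta_{t-1}) \bigr)^{-1} [\mathcal{K}_t(\Delta_{t-1})]^+ < \infty, \quad P\text{-a.s.},$$
where
$$\mathcal{K}_t(u) = \Delta V_t(u) + V_t'(u)\, \gamma_t(z^0 + u)\, R_t(z^0 + u) + \eta_t(z^0 + u)$$
and
$$\eta_t(v) = \frac{1}{2} \sup_z E\Bigl\{ [R_t(v) + \varepsilon_t(v)]^T \gamma_t^T(v)\, V_t''(z)\, \gamma_t(v) [R_t(v) + \varepsilon_t(v)] \Bigm| \mathcal{F}_{t-1} \Bigr\}.$$

Then $V_t(\Delta_t)$ converges ($P$-a.s.) to a finite limit for any initial value $Z_0$.

Furthermore, if there exists a set $A \in \mathcal{F}$ with $P(A) > 0$ such that for each $\epsilon \in (0, 1)$

(V3)
$$\sum_{t=1}^{\infty} \inf_{\substack{\epsilon \le V_t(u) \le 1/\epsilon \\ z^0 + u \in U_{t-1}}} [\mathcal{K}_t(u)]^- = \infty \quad \text{on } A, \tag{3.1}$$

then $V_t(\Delta_t) \to 0$ ($P$-a.s.) for any initial value $Z_0$.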

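Proof. The proof is similar to that of Theorems 2.2 and 2.4 in Sharia (2014). Rewrite
(2.1) in the form

$$\Delta_t = \Delta_{t-1} + \gamma_t(Z_{t-1}) [R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})].$$

By (V1), using the Taylor expansion, we have

$$V_t(\Delta_t) \le V_t(\Delta_{t-1}) + V_t'(\Delta_{t-1})\, \gamma_t(Z_{t-1}) [R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})]$$
$$\qquad + \frac{1}{2} [R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})]^T \gamma_t^T(Z_{t-1})\, V_t''(\tilde\Delta_{t-1})\, \gamma_t(Z_{t-1}) [R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})],$$

where $\tilde\Delta_{t-1} \in \mathbb{R}^m$ is $\mathcal{F}_{t-1}$ measurable. Since

$$V_t(\Delta_{t-1}) = V_{t-1}(\Delta_{t-1}) + \Delta V_t(\Delta_{t-1}),$$

using (2.2) and (2.3), we obtain

$$E\{V_t(\Delta_t) \mid \mathcal{F}_{t-1}\} \le V_{t-1}(\Delta_{t-1}) + \mathcal{K}_t(\Delta_{t-1}).$$

Then, using the decomposition $\mathcal{K}_t = [\mathcal{K}_t]^+ - [\mathcal{K}_t]^-$, the above can be rewritten as

$$E\{V_t(\Delta_t) \mid \mathcal{F}_{t-1}\} \le V_{t-1}(\Delta_{t-1})(1 + B_t) + B_t - [\mathcal{K}_t(\Delta_{t-1})]^-,$$

where $B_t = \bigl( 1 + V_{t-1}(\Delta_{t-1}) \bigr)^{-1} [\mathcal{K}_t(\Delta_{t-1})]^+$.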












By (V2), we have that $\sum_{t=1}^{\infty} B_t < \infty$. Now we can use Lemma 6.1 in Appendix
(with $X_t = V_t(\Delta_t)$, $\beta_{t-1} = \xi_{t-1} = B_t$ and $\zeta_t = [\mathcal{K}_t(\Delta_{t-1})]^-$) to deduce that the
processes $V_t(\Delta_t)$ and

$$Y_t = \sum_{s=1}^{t} [\mathcal{K}_s(\Delta_{s-1})]^-$$

converge to some finite limits. Therefore, it follows that $V_t(\Delta_t) \to r \ge 0$.

To prove the second assertion, suppose that $r > 0$. Then there exists $\epsilon > 0$ such
that $\epsilon \le V_t(\Delta_t) \le 1/\epsilon$ eventually. By (3.1), this would imply that for some $t_0$,

$$\sum_{s=t_0}^{\infty} [\mathcal{K}_s(\Delta_{s-1})]^- \ge \sum_{s=t_0}^{\infty} \inf_{\substack{\epsilon \le V_s(u) \le 1/\epsilon \\ z^0 + u \in U_{s-1}}} [\mathcal{K}_s(u)]^- = \infty$$

on the set $A$, which contradicts the existence of a finite limit of $Y_t$. Hence, $r = 0$ and
$V_t(\Delta_t) \to 0$. ■

Remark 3.2 The conditions of the above Lemma are difficult to interpret. Therefore,
the rest of the section is devoted to formulating lemmas and corollaries (Lemmas
3.5 and 3.9, Corollaries 3.7, 3.12 and 3.13) containing sufficient conditions for the
convergence and the rate of convergence, and remarks (Remarks 3.3, 3.4, 3.8, 3.10,
3.11 and 3.14) explaining some of the assumptions. These results are presented in
such a way that each subsequent statement imposes conditions that are more restrictive
than the previous one. For example, Corollary 3.13 and Remark 3.14 contain
conditions which are more restrictive than all the previous ones, but are written in
the simplest possible terms.

Remark 3.3 A typical choice of $V_t(u)$ is $V_t(u) = u^T C_t u$, where $\{C_t\}$ is a predictable
positive semi-definite matrix process. If $C_t / a_t$ goes to a finite matrix with $a_t \to \infty$,
then subject to the conditions of Lemma 3.1, $a_t \|Z_t - z^0\|^2$ will tend to a finite
limit implying that $Z_t \to z^0$. This approach is adopted in Example 5.3 to derive
convergence of the on-line least squares estimator.

Remark 3.4 Consider truncation sets $U_t = S(\alpha_t, r_t)$, where $S$ denotes a closed
sphere in $\mathbb{R}^m$ with the center at $\alpha_t \in \mathbb{R}^m$ and the radius $r_t$. Let $z_t' = \Phi_{U_t}(z_t)$ and
suppose that $z^0 \in U_t$. Let $V_t(u) = u^T C_t u$ where $C_t$ is a positive definite matrix and
denote by $\lambda_t^{\max}$ and $\lambda_t^{\min}$ the largest and smallest eigenvalues of $C_t$ respectively. Then
$(z_t' - z^0)^T C_t (z_t' - z^0) \le (z_t - z^0)^T C_t (z_t - z^0)$ (i.e., (V1) holds with $V_t(u) = u^T C_t u$)
if $\lambda_t^{\max} v_t^2 \le \lambda_t^{\min} r_t^2$, where $v_t = \|\alpha_t - z^0\|$. (See Proposition 6.2 in Appendix for
details.) In particular, if $C_t$ is a scalar matrix, condition (V1) automatically holds.
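The scalar-matrix case of this remark can be checked numerically. The sketch below uses hypothetical helper names and an assumed two-dimensional example with $C_t = cI$; it verifies on random points that projecting onto a closed ball containing $z^0$ does not increase the quadratic form.

```python
import math
import random


def project_onto_ball(z, center, radius):
    """Phi_U(z) for the closed ball U = S(center, radius)."""
    diff = [zi - ci for zi, ci in zip(z, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(z)
    return [ci + radius * d / dist for ci, d in zip(center, diff)]


def eigenvalue_condition(lam_max, lam_min, center, radius, z0):
    """Check lam_max * v^2 <= lam_min * r^2 with v = ||center - z0||."""
    v_sq = sum((ci - zi) ** 2 for ci, zi in zip(center, z0))
    return lam_max * v_sq <= lam_min * radius ** 2


def quad_form_scalar(c, u):
    """V(u) = u^T (c I) u for a scalar matrix C = c I."""
    return c * sum(ui * ui for ui in u)


# Assumed example: ball centred at (0.5, 0) with radius 1, root z0 = (0, 0), C = 2 I.
center, radius, z0, c = [0.5, 0.0], 1.0, [0.0, 0.0], 2.0
rng = random.Random(5)
violations = 0
for _ in range(1000):
    z = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
    z_proj = project_onto_ball(z, center, radius)
    before = quad_form_scalar(c, [a - b for a, b in zip(z, z0)])
    after = quad_form_scalar(c, [a - b for a, b in zip(z_proj, z0)])
    if after > before + 1e-12:
        violations += 1
print(eigenvalue_condition(c, c, center, radius, z0), violations)
```

For a scalar matrix the eigenvalue condition reduces to $v_t \le r_t$, i.e. to $z^0 \in U_t$, so no violations are expected; for a general $C_t$ the function `eigenvalue_condition` only evaluates the sufficient condition stated in the remark.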
Lemma 3.5 Suppose that all the conditions of Lemma 3.1 hold and

(L) for any $M > 0$, there exists some $\delta = \delta(\omega) > 0$ such that

$$\inf_{\|u\| \ge M} V_t(u) > \delta \quad \text{eventually.}$$

Then $Z_t \to z^0$ ($P$-a.s.) for any initial value $Z_0$.

Proof. From Lemma 3.1, we have $V_t(\Delta_t) \to 0$ (a.s.). Now, $\Delta_t \to 0$ follows from
(L) by contradiction. Indeed, suppose that $\Delta_t \not\to 0$ on a set, say $B$, of positive
probability. Then, for any fixed $\omega$ from this set, there would exist a sequence $t_k \to \infty$
such that $\|\Delta_{t_k}\| \ge \epsilon$ for some $\epsilon > 0$, and condition (L) would imply that
$V_{t_k}(\Delta_{t_k}) > \delta > 0$ for large $k$'s, which contradicts the $P$-a.s. convergence $V_t(\Delta_t) \to 0$. ■

Remark 3.6 The following corollary contains simple sufficient conditions for
convergence. The proof of this corollary does not require dynamically changing Lyapunov
functions and can be obtained from a less general version of Lemma 3.1 presented
in Sharia (2014). We decided to present this corollary for the sake of completeness,
noting that the proof, as well as a number of different sets of sufficient conditions,
can be found in Sharia (2014).

Corollary 3.7 Suppose that $Z_t$ is a process defined by (2.1), $U_t$ are admissible
truncations for $z^0$ and

(D1) for large $t$'s
$$(z - z^0)^T R_t(z) \le 0 \quad \text{if } z \in U_{t-1};$$

(D2) there exists a predictable process $r_t > 0$ such that
$$\sup_{z \in U_{t-1}} \frac{E\{\|R_t(z) + \varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + \|z - z^0\|^2} \le r_t$$
eventually, and
$$\sum_{t=1}^{\infty} r_t a_t^{-2} < \infty, \quad P\text{-a.s.}$$

Then $\|Z_t - z^0\|$ converges ($P$-a.s.) to a finite limit.
Furthermore, if

(D3) for each $\epsilon \in (0, 1)$, there exists a predictable process $\nu_t > 0$ such that
$$\inf_{\substack{\epsilon \le \|z - z^0\| \le 1/\epsilon \\ z \in U_{t-1}}} -(z - z^0)^T R_t(z) > \nu_t$$
eventually, where
$$\sum_{t=1}^{\infty} \nu_t a_t^{-1} = \infty, \quad P\text{-a.s.}$$

Then $Z_t$ converges ($P$-a.s.) to $z^0$.

Proof. See Remark 3.6 above.

Remark 3.8 The rest of this section is concerned with the derivation of sufficient
conditions to establish rate of convergence. In most applications, checking the
conditions of Lemma 3.9 and Corollary 3.12 below is difficult without establishing the
convergence of $Z_t$ first. Therefore, although formally not required, we can assume
that the convergence $Z_t \to z^0$ has already been established (using the lemmas and
corollaries above or otherwise). Under this assumption, conditions for the rate of
convergence below can be regarded as local in $z^0$, that is, they can be derived using
certain continuity and differentiability assumptions of the corresponding functions
at the point $z^0$ (see examples in Section 5).


Lemma 3.9 Suppose that $Z_t$ is a process defined by (2.1). Let $\{C_t\}$ be a predictable
positive definite $m \times m$ matrix process, and $\lambda_t^{\max}$ and $\lambda_t^{\min}$ be the largest and the
smallest eigenvalues of $C_t$ respectively. Denote $\Delta_t = Z_t - z^0$. Suppose also that (V1)
of Lemma 3.1 holds and

(R1) there exists a predictable non-negative scalar process $\mathcal{P}_t$ such that
$$\frac{2 \Delta_{t-1}^T C_t\, \gamma_t(z^0 + \Delta_{t-1})\, R_t(z^0 + \Delta_{t-1})}{\lambda_t^{\max}} + \mathcal{P}_t \le -\rho_t \|\Delta_{t-1}\|^2$$
eventually, where $\rho_t$ is a predictable non-negative scalar process satisfying
$$\sum_{t=1}^{\infty} \left[ \frac{\lambda_t^{\max} - \lambda_{t-1}^{\min}}{\lambda_{t-1}^{\min}} - \frac{\lambda_t^{\max}}{\lambda_{t-1}^{\min}} \rho_t \right]^+ < \infty;$$

(R2)
$$\sum_{t=1}^{\infty} \lambda_t^{\max} \frac{\Bigl[ E\bigl\{ \|\gamma_t(z^0 + \Delta_{t-1}) [R_t(z^0 + \Delta_{t-1}) + \varepsilon_t(z^0 + \Delta_{t-1})]\|^2 \bigm| \mathcal{F}_{t-1} \bigr\} - \mathcal{P}_t \Bigr]^+}{1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2} < \infty.$$

Then $(Z_t - z^0)^T C_t (Z_t - z^0)$ converges to a finite limit ($P$-a.s.).


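Proof. Let us check the conditions of Lemma 3.1 with $V_t(u) = u^T C_t u$. Condition
(V1) is satisfied automatically.

Denote $R_t = R_t(z^0 + \Delta_{t-1})$, $\gamma_t = \gamma_t(z^0 + \Delta_{t-1})$ and $\varepsilon_t = \varepsilon_t(z^0 + \Delta_{t-1})$. Since
$V_t'(u) = 2 u^T C_t$ and $V_t''(u) = 2 C_t$, we have

$$\mathcal{K}_t(\Delta_{t-1}) = \Delta V_t(\Delta_{t-1}) + 2 \Delta_{t-1}^T C_t \gamma_t R_t + E\bigl\{ [\gamma_t (R_t + \varepsilon_t)]^T C_t\, \gamma_t (R_t + \varepsilon_t) \bigm| \mathcal{F}_{t-1} \bigr\}.$$

Since $C_t$ is positive definite, $\lambda_t^{\min} \|u\|^2 \le u^T C_t u \le \lambda_t^{\max} \|u\|^2$ for any $u \in \mathbb{R}^m$. Therefore

$$\Delta V_t(\Delta_{t-1}) \le (\lambda_t^{\max} - \lambda_{t-1}^{\min}) \|\Delta_{t-1}\|^2.$$

Denote

$$\tilde{\mathcal{P}}_t = \lambda_t^{\max} (\mathcal{D}_t - \mathcal{P}_t),$$

where

$$\mathcal{D}_t = E\{ \|\gamma_t (R_t + \varepsilon_t)\|^2 \mid \mathcal{F}_{t-1} \}.$$

Then

$$\mathcal{K}_t(\Delta_{t-1}) \le (\lambda_t^{\max} - \lambda_{t-1}^{\min}) \|\Delta_{t-1}\|^2 + 2 \Delta_{t-1}^T C_t \gamma_t R_t + \lambda_t^{\max} \mathcal{D}_t$$
$$\qquad = (\lambda_t^{\max} - \lambda_{t-1}^{\min}) \|\Delta_{t-1}\|^2 + 2 \Delta_{t-1}^T C_t \gamma_t R_t + \lambda_t^{\max} \mathcal{P}_t + \tilde{\mathcal{P}}_t.$$

By (R1), we have

$$2 \Delta_{t-1}^T C_t \gamma_t R_t \le -\lambda_t^{\max} (\rho_t \|\Delta_{t-1}\|^2 + \mathcal{P}_t).$$

Therefore,

$$\mathcal{K}_t(\Delta_{t-1}) \le (\lambda_t^{\max} - \lambda_{t-1}^{\min}) \|\Delta_{t-1}\|^2 - \lambda_t^{\max} (\rho_t \|\Delta_{t-1}\|^2 + \mathcal{P}_t) + \lambda_t^{\max} \mathcal{P}_t + \tilde{\mathcal{P}}_t$$
$$\qquad \le (\lambda_t^{\max} - \lambda_{t-1}^{\min} - \lambda_t^{\max} \rho_t) \|\Delta_{t-1}\|^2 + \tilde{\mathcal{P}}_t = r_t \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2 + \tilde{\mathcal{P}}_t,$$

where

$$r_t = (\lambda_t^{\max} - \lambda_{t-1}^{\min} - \lambda_t^{\max} \rho_t) / \lambda_{t-1}^{\min}.$$

Since $\lambda_{t-1}^{\min} > 0$, using the inequality $[a + b]^+ \le [a]^+ + [b]^+$, we have

$$[\mathcal{K}_t(\Delta_{t-1})]^+ \le \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2 [r_t]^+ + [\tilde{\mathcal{P}}_t]^+.$$

Also, since $V_{t-1}(\Delta_{t-1}) = \Delta_{t-1}^T C_{t-1} \Delta_{t-1} \ge \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2$,

$$\frac{[\mathcal{K}_t(\Delta_{t-1})]^+}{1 + V_{t-1}(\Delta_{t-1})} \le \frac{[\mathcal{K}_t(\Delta_{t-1})]^+}{1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2} \le \frac{\lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2 [r_t]^+}{1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2} + \frac{[\tilde{\mathcal{P}}_t]^+}{1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2}$$
$$\qquad \le [r_t]^+ + \frac{[\tilde{\mathcal{P}}_t]^+}{1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2}.$$

By (R2), $\sum_{t=1}^{\infty} [\tilde{\mathcal{P}}_t]^+ / (1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2) < \infty$ and according to (R1)

$$\sum_{t=1}^{\infty} [r_t]^+ = \sum_{t=1}^{\infty} \left[ \frac{\lambda_t^{\max} - \lambda_{t-1}^{\min}}{\lambda_{t-1}^{\min}} - \frac{\lambda_t^{\max}}{\lambda_{t-1}^{\min}} \rho_t \right]^+ < \infty.$$

Thus,

$$\sum_{t=1}^{\infty} \frac{[\mathcal{K}_t(\Delta_{t-1})]^+}{1 + V_{t-1}(\Delta_{t-1})} < \infty,$$

implying that Condition (V2) of Lemma 3.1 holds. Thus, $(Z_t - z^0)^T C_t (Z_t - z^0)$
converges to a finite limit almost surely. ■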


Remark 3.10 The choice $\mathcal{P}_t = 0$ means that (R2) becomes more restrictive, imposing
stronger probabilistic restrictions on the model. Now, if $\Delta_{t-1}^T C_t \gamma_t(z^0 + \Delta_{t-1}) R_t(z^0 + \Delta_{t-1})$
is eventually negative with a large absolute value, then it is
possible to introduce a non-zero $\mathcal{P}_t$ without strengthening condition (R1). One
possibility might be $\mathcal{P}_t = \|\gamma_t R_t\|^2$. In that case, since $\gamma_t$ and $R_t$ are predictable processes,
and the sequence $\varepsilon_t$ is a martingale difference,

$$E\{ \|\gamma_t (R_t + \varepsilon_t)\|^2 \mid \mathcal{F}_{t-1} \} = \|\gamma_t R_t\|^2 + E\{ \|\gamma_t \varepsilon_t\|^2 \mid \mathcal{F}_{t-1} \}.$$

Then condition (R2) can be rewritten as

$$\sum_{t=1}^{\infty} \lambda_t^{\max} E\{ \|\gamma_t(z^0 + \Delta_{t-1})\, \varepsilon_t(z^0 + \Delta_{t-1})\|^2 \mid \mathcal{F}_{t-1} \} < \infty.$$

Remark 3.11 The next corollary is a special case of Lemma 3.9 when the step-size
sequence is a sequence of scalar matrices, i.e., $\gamma_t(Z_{t-1}) = a_t^{-1} I$, where $a_t$ is
non-decreasing and positive.
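Corollary 3.12 Let $Z_t$ be a process defined by (2.1). Suppose that $a_t > 0$ is a
non-decreasing sequence and

(W1)
$$\Delta_{t-1}^T R_t(Z_{t-1}) \le -\frac{1}{2} \Delta a_t \|\Delta_{t-1}\|^2$$
eventually;

(W2) there exists $0 < \delta \le 1$ such that
$$\sum_{t=1}^{\infty} a_t^{\delta - 2} E\{ \|R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})\|^2 \mid \mathcal{F}_{t-1} \} < \infty.$$

Then $a_t^{\delta} \|Z_t - z^0\|^2$ converges to a finite limit ($P$-a.s.).

Proof. Consider Lemma 3.9 with $\gamma_t = \gamma_t(z) = a_t^{-1} I$, $C_t = a_t^{\delta} I$, $\mathcal{P}_t = 0$ and
$\rho_t = \Delta a_t / a_t$. To check (R2), denote the infinite sum in (R2) by $Q$, then

$$Q \le \sum_{t=1}^{\infty} \lambda_t^{\max} E\bigl\{ \|\gamma_t [R_t(z^0 + \Delta_{t-1}) + \varepsilon_t(z^0 + \Delta_{t-1})]\|^2 \bigm| \mathcal{F}_{t-1} \bigr\}$$
$$\qquad \le \sum_{t=1}^{\infty} \lambda_t^{\max} \|\gamma_t\|^2 E\{ \|R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1})\|^2 \mid \mathcal{F}_{t-1} \}.$$

Now, since $\lambda_t^{\min} = \lambda_t^{\max} = a_t^{\delta}$ and $\|\gamma_t\|^2 = a_t^{-2}$, condition (W2) leads to (R2).

Since $\rho_t = \Delta a_t / a_t < 1$ and $(a_t / a_{t-1})^{\delta} \le a_t / a_{t-1}$,

$$\sum_{t=1}^{\infty} \left[ \frac{\lambda_t^{\max} - \lambda_{t-1}^{\min}}{\lambda_{t-1}^{\min}} - \frac{\lambda_t^{\max}}{\lambda_{t-1}^{\min}} \rho_t \right]^+ = \sum_{t=1}^{\infty} \left[ \frac{a_t^{\delta} - a_{t-1}^{\delta}}{a_{t-1}^{\delta}} - \frac{a_t^{\delta}}{a_{t-1}^{\delta}} \rho_t \right]^+$$
$$\qquad = \sum_{t=1}^{\infty} \left[ (1 - \rho_t) \left( \frac{a_t}{a_{t-1}} \right)^{\delta} - 1 \right]^+ \le \sum_{t=1}^{\infty} \left[ \left( 1 - \frac{\Delta a_t}{a_t} \right) \frac{a_t}{a_{t-1}} - 1 \right]^+ = 0.$$

Therefore, (W1) leads to (R1). According to Remark 3.4, condition (V1) holds
since $V_t(u) = a_t^{\delta} \|u\|^2$. Thus, all the conditions of Lemma 3.9 hold and $a_t^{\delta} \|Z_t - z^0\|^2$
converges to a finite limit ($P$-a.s.). ■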
Corollary 3.13 Let $Z_t$ be a process defined by (2.1) where $z^0 \in \mathbb{R}$, $\gamma_t(Z_{t-1}) = 1/t$
and the truncation sequence $U_t$ is admissible. Suppose that $Z_t \to z^0$ and

(Y1) $R_t'(z^0) \le -1/2$ eventually;

(Y2) $R_t(z)$ and $\sigma_t^2(z) = E(\varepsilon_t^2(z) \mid \mathcal{F}_{t-1})$ are locally uniformly bounded at $z^0$ w.r.t.
$t$; that is, there exists a constant $K$ such that $|R_t(\xi_t)| \le K$ and $|\sigma_t^2(\xi_t)| \le K$
eventually, for any $\xi_t \to z^0$.

Then $t^{\delta} (Z_t - z^0)^2$ converges to a finite limit ($P$-a.s.), for any $\delta < 1$.

Proof. Consider Corollary 3.12 with $a_t = t$. In the one-dimensional case, condition
(W1) can be rewritten as

$$\frac{R_t(z^0 + \Delta_{t-1})}{\Delta_{t-1}} \le -\frac{1}{2}.$$

Condition (W1) now follows from (Y1).

Since $E\{\varepsilon_t(z) \mid \mathcal{F}_{t-1}\} = 0$, using (Y2) we have for any $\delta < 1$,

$$\sum_{t=1}^{\infty} t^{\delta - 2} E\bigl\{ (R_t(Z_{t-1}) + \varepsilon_t(Z_{t-1}))^2 \bigm| \mathcal{F}_{t-1} \bigr\}$$
$$\qquad = \sum_{t=1}^{\infty} t^{\delta - 2} R_t^2(Z_{t-1}) + \sum_{t=1}^{\infty} t^{\delta - 2} E\{ \varepsilon_t^2(Z_{t-1}) \mid \mathcal{F}_{t-1} \} < \infty.$$

Thus, condition (W2) holds. Therefore, $t^{\delta} (Z_t - z^0)^2$ converges to a finite limit
($P$-a.s.), for any $\delta < 1$. ■


Remark 3.14 Corollary 3.13 gives simple but more restrictive sufficient conditions
to derive the rate of convergence in one-dimensional cases. It is easy to see that
all conditions of Corollary 3.13 trivially hold if, e.g., $\varepsilon_t$ are state independent i.i.d.
random variables with a finite second moment, $R_t(z) = R(z)$, and $R'(z^0) \le -1/2$.
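The setting of this remark is easy to simulate. The sketch below assumes a linear $R(z) = -0.75\,(z - z^0)$, i.i.d. standard normal errors and no truncation (i.e. $U_t = \mathbb{R}$); these choices and the function name are assumptions for illustration. It returns the scaled error $t^{\delta}(Z_t - z^0)^2$ at the final step.

```python
import random


def scaled_squared_error(delta, n_steps, z0=1.0, slope=-0.75, z_init=3.0, seed=4):
    """Return t^delta * (Z_t - z0)^2 at t = n_steps for the step-size gamma_t = 1/t."""
    rng = random.Random(seed)
    z = z_init
    for t in range(1, n_steps + 1):
        # R(z) = slope * (z - z0), so R'(z0) = slope <= -1/2 as required in (Y1).
        z = z + (1.0 / t) * (slope * (z - z0) + rng.gauss(0.0, 1.0))
    return n_steps ** delta * (z - z0) ** 2


value = scaled_squared_error(delta=0.5, n_steps=100000)
print(value)
```

With $\delta = 0.5$ the squared error is of order $1/t$ in this example, so the scaled quantity is small at large $t$, in line with the corollary's statement that it has a finite limit for any $\delta < 1$.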

4 Classical problem of stochastic approximation

Consider the classical problem of stochastic approximation to find a root $z^0$ of the
equation $R(z^0) = 0$. Let us take a step-size sequence $\gamma_t = a_t^{-1} I$, where $a_t \to \infty$ is
a predictable scalar process, and consider the procedure

$$Z_t = \Phi_{U_t}\bigl( Z_{t-1} + a_t^{-1} [R(Z_{t-1}) + \varepsilon_t(Z_{t-1})] \bigr). \tag{4.1}$$
Corollary 4.1 Suppose that $Z_t$ is a process defined by (4.1), the truncation sequence $U_t$
is admissible, and

(H1)
$$(z - z^0)^T R(z) \le 0$$
for any $z \in \mathbb{R}^m$ with the property that $z \in U_t$ eventually;

(H2) there exists a predictable process $r_t$ such that
$$\sup_{z \in U_{t-1}} \frac{\|R(z)\|^2}{1 + \|z - z^0\|^2} \le r_t$$
eventually, where
$$\sum_{t=1}^{\infty} a_t^{-2} r_t < \infty;$$

(H3) there exists a predictable process $e_t$ such that
$$\sup_{z \in U_{t-1}} \frac{E\{\|\varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1}\}}{1 + \|z - z^0\|^2} \le e_t$$
eventually, where
$$\sum_{t=1}^{\infty} e_t a_t^{-2} < \infty, \quad P\text{-a.s.}$$

Then $\|Z_t - z^0\|$ converges to a finite limit ($P$-a.s.) for any initial value $Z_0$.

Furthermore, suppose that

(H4) $R(z)$ is continuous at $z^0$ and $(z - z^0)^T R(z) < 0$ for all $z$ with the property
that $z \in U_t \setminus \{z^0\}$ eventually;

(H5)
$$\sum_{t=1}^{\infty} a_t^{-1} = \infty.$$

Then $Z_t \to z^0$ ($P$-a.s.).

Proof. Consider Corollary 3.7 with $R_t = R$. Condition (D1) trivially holds. Since
$E\{\varepsilon_t(z) \mid \mathcal{F}_{t-1}\} = 0$, we have

$$E\{ \|R(z) + \varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1} \} = \|R(z)\|^2 + E\{ \|\varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1} \}.$$

Now condition (D2) holds with the bounding process $r_t + e_t$.
By (H4), for each $\epsilon \in (0, 1)$ there exists a constant $\nu > 0$ such that

$$\inf_{\substack{\epsilon \le \|z - z^0\| \le 1/\epsilon \\ z \in U_{t-1}}} -(z - z^0)^T R(z) > \nu$$

eventually, and by (H5) $\sum_{t=1}^{\infty} \nu a_t^{-1} = \nu \sum_{t=1}^{\infty} a_t^{-1} = \infty$. This implies that (D3) also
holds. Therefore, by Corollary 3.7, $Z_t \to z^0$ almost surely. ■

Remark 4.2 Suppose that $\varepsilon_t = \varepsilon_t(z)$ is an error term which does not depend on $z$
and denote

$$\sigma_t^2 = E\{ \|\varepsilon_t\|^2 \mid \mathcal{F}_{t-1} \}.$$

Then condition (H3) holds if

$$\sum_{t=1}^{\infty} \sigma_t^2 a_t^{-2} < \infty, \quad P\text{-a.s.} \tag{4.2}$$

This shows that the requirements on the error terms are quite weak. In particular,
the conditional variances do not have to be bounded w.r.t. $t$.
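A quick numerical look at condition (4.2) shows how much growth of the variances it tolerates. The choices $a_t = t$, $\sigma_t^2 = \sqrt{t}$ and $\sigma_t^2 = t$ are assumed examples, and the function name is hypothetical.

```python
def partial_sum_4_2(variance, step, n_terms):
    """Partial sum of sigma_t^2 * a_t^{-2} for t = 1..n_terms, cf. condition (4.2)."""
    return sum(variance(t) / step(t) ** 2 for t in range(1, n_terms + 1))


# Assumed examples with a_t = t: variances growing like sqrt(t) versus like t.
slow_growth = partial_sum_4_2(lambda t: t ** 0.5, lambda t: float(t), 100000)
fast_growth = partial_sum_4_2(lambda t: float(t), lambda t: float(t), 100000)
print(slow_growth, fast_growth)
```

With $\sigma_t^2 = \sqrt{t}$ the series behaves like $\sum t^{-3/2}$ and its partial sums stay bounded, so (4.2) holds although the variances are unbounded; with $\sigma_t^2 = t$ the series is harmonic and the partial sums keep growing.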

Remark 4.3 (a) If the truncation sets are uniformly bounded, then some of the
conditions above can be weakened considerably. For example, condition (H2) in
Corollary 4.1 will automatically hold given that $\sum_{t=1}^{\infty} a_t^{-2} < \infty$.

(b) Also, if it is only required that $Z_t$ converges to any finite limit, the step-size
sequence $a_t$ can go to infinity at any rate as long as $\sum_{t=1}^{\infty} a_t^{-2} < \infty$. However, in
order to have $Z_t \to z^0$, one must ensure that $a_t$ does not increase too fast. Also,
the variances of the error terms can go to infinity as $t$ tends to infinity, as long as
the sum in (H3) is bounded.

Remark 4.4 To compare the above result to that of Kushner-Clark's setting, let us
assume boundedness of $Z_t$. Then there exists a compact set $U$ such that $Z_t \in U$.
Without loss of generality, we can assume that $z^0 \in U$. Then $Z_t$ in Corollary 4.1
can be assumed to be generated using the truncations on $U_t \supset U$. Let us assume
that $\sum_{t=1}^{\infty} a_t^{-2} < \infty$. Then, condition (H2) will hold if, e.g., $R(z)$ is a continuous
function. Also, in this case, given that the error terms $\varepsilon_t(z)$ are continuous in $z$
with some uniformity w.r.t. $t$, they will in fact behave in the same way as state
independent error terms. Therefore, a condition of the type (4.2) given in Remark
4.2 will be sufficient for (H3).

Corollary 4.5 Suppose that $Z_t$, defined by (4.1), converges to $z^0$ ($P$-a.s.) and the
truncation sequence $U_t$ is admissible. Suppose also that

(B1)
$$u^T R(z^0 + u) \le -\frac{1}{2} \|u\|^2 \quad \text{for small } u\text{'s};$$

(B2) $a_t > 0$ is non-decreasing with
$$\sum_{t=1}^{\infty} \left[ \frac{\Delta a_t - 1}{a_{t-1}} \right]^+ < \infty;$$

(B3) there exists $\delta \in (0, 1)$ such that
$$\sum_{t=1}^{\infty} a_t^{\delta - 2} \|R(z^0 + v_t)\|^2 < \infty \quad \text{and} \quad \sum_{t=1}^{\infty} a_t^{\delta - 2} E\{ \|\varepsilon_t(z^0 + v_t)\|^2 \mid \mathcal{F}_{t-1} \} < \infty,$$
where $v_t \in U_t$ is any predictable process with the property $v_t \to 0$.

Then $a_t^{\delta} \|Z_t - z^0\|^2$ converges ($P$-a.s.) to a finite limit.

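Proof. Let us check that the conditions of Lemma 3.9 hold with $R_t = R$, $\rho_t = a_t^{-1}$,
$\mathcal{P}_t = 0$ and $C_t = a_t^{\delta} I$. We have $\lambda_t^{\max} = \lambda_t^{\min} = a_t^{\delta}$ and, by (B2),

$$\sum_{t=1}^{\infty} \left[ \frac{\lambda_t^{\max} - \lambda_{t-1}^{\min}}{\lambda_{t-1}^{\min}} - \frac{\lambda_t^{\max}}{\lambda_{t-1}^{\min}} \rho_t \right]^+ = \sum_{t=1}^{\infty} \left[ \frac{a_t^{\delta} - a_{t-1}^{\delta}}{a_{t-1}^{\delta}} - \frac{a_t^{\delta}}{a_{t-1}^{\delta}} a_t^{-1} \right]^+$$
$$\qquad = \sum_{t=1}^{\infty} \left[ \left( \frac{a_t}{a_{t-1}} \right)^{\delta} (1 - a_t^{-1}) - 1 \right]^+ \le \sum_{t=1}^{\infty} \left[ \frac{a_t}{a_{t-1}} (1 - a_t^{-1}) - 1 \right]^+ + C$$
$$\qquad = \sum_{t=1}^{\infty} \left[ \frac{\Delta a_t - 1}{a_{t-1}} \right]^+ + C < \infty$$

for some constant $C$. So (B1) leads to (R1). Also, since $Z_t \to z^0$ and

$$\sum_{t=1}^{\infty} \lambda_t^{\max} \frac{\bigl[ E\{ \|\gamma_t (R_t + \varepsilon_t)\|^2 \mid \mathcal{F}_{t-1} \} - \mathcal{P}_t \bigr]^+}{1 + \lambda_{t-1}^{\min} \|\Delta_{t-1}\|^2} \le \sum_{t=1}^{\infty} \lambda_t^{\max} \bigl[ E\{ \|\gamma_t (R_t + \varepsilon_t)\|^2 \mid \mathcal{F}_{t-1} \} - \mathcal{P}_t \bigr]^+$$
$$\qquad = \sum_{t=1}^{\infty} a_t^{\delta} E\{ \|a_t^{-1} (R + \varepsilon_t)\|^2 \mid \mathcal{F}_{t-1} \}$$
$$\qquad \le \sum_{t=1}^{\infty} a_t^{\delta - 2} \|R(Z_{t-1})\|^2 + \sum_{t=1}^{\infty} a_t^{\delta - 2} E\{ \|\varepsilon_t(Z_{t-1})\|^2 \mid \mathcal{F}_{t-1} \},$$

condition (R2) follows from (B3). Therefore, by Lemma 3.9, $(Z_t - z^0)^T C_t (Z_t - z^0) =
a_t^{\delta} \|Z_t - z^0\|^2$ converges to a finite limit ($P$-a.s.). ■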

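Remark 4.6 It follows from Proposition 6.3 in the Appendix that if $a_t = t^{\epsilon}$ with $\epsilon > 1$,
then (B2) does not hold. However, condition (B2) holds if $a_t = t^{\epsilon}$ for all $\epsilon \le 1$. Indeed,
summing over $t \ge 2$ (so that $a_{t-1} > 0$),

$$\sum_{t=2}^{\infty} \left[ \frac{\Delta a_t - 1}{a_{t-1}} \right]^+
= \sum_{t=2}^{\infty} \left[ \left(\frac{t}{t-1}\right)^{\epsilon} - 1 - \frac{1}{(t-1)^{\epsilon}} \right]^+
\le \sum_{t=2}^{\infty} \left[ \frac{t}{t-1} - 1 - \frac{1}{t-1} \right]^+ = 0.$$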
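The dichotomy in Remark 4.6 can be illustrated numerically. The following Python sketch evaluates partial sums of the series in (B2) for $a_t = t^{\epsilon}$; the function name `b2_partial_sum` and the choice to start the sum at $t = 2$ (so that $a_{t-1} > 0$) are assumptions made only for this illustration.

```python
def b2_partial_sum(eps, n_terms):
    """Partial sum over t = 2..n_terms of [(a_t - a_{t-1} - 1) / a_{t-1}]^+ for a_t = t**eps."""
    total = 0.0
    for t in range(2, n_terms + 1):
        a_prev = float(t - 1) ** eps
        a_cur = float(t) ** eps
        # positive part of (Delta a_t - 1) / a_{t-1}
        total += max((a_cur - a_prev - 1.0) / a_prev, 0.0)
    return total


if __name__ == "__main__":
    for eps in (0.5, 1.0, 2.0):
        print(eps, b2_partial_sum(eps, 100), b2_partial_sum(eps, 10_000))
```

For $\epsilon \le 1$ every term is zero, whereas for $\epsilon = 2$ the terms equal $2/(t-1)$ and the partial sums grow like a harmonic series, in line with Proposition 6.3.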


Corollary 4.7 Suppose that $Z_t \to z^0$, where $Z_t$ is defined by (4.1) with $a_t = t^{\epsilon}$
where $\epsilon \in (1/2, 1]$, and (B1) in Corollary 4.5 holds. Suppose also that $R$ is continuous
at $z^0$ and there exists $0 < \delta < 2 - 1/\epsilon$ such that

(BB)
$$\sum_{t=1}^{\infty} \frac{1}{t^{(2-\delta)\epsilon}} E\{\|\varepsilon_t(z^0 + v_t)\|^2 \mid \mathcal{F}_{t-1}\} < \infty,$$

where $v_t \in U_t$ is any predictable process with the property $v_t \to 0$.
Then $t^{\delta}\|Z_t - z^0\|^2$ converges to a finite limit (P-a.s.).


Proof. Let us check the conditions of Corollary 4.5 with $a_t = t^{\epsilon}$ where $\epsilon \in (1/2, 1]$.
Condition (B2) is satisfied (see Remark 4.6). Since $R$ is continuous at $z^0$ and $Z_t \to z^0$,
it follows that $R(z^0 + v_t)$ in (B3) is bounded. Also, $a_t^{\delta-2} = t^{(\delta-2)\epsilon}$ and since
$(\delta - 2)\epsilon < -1$, it follows that the first part of (B3) holds. The second part is a
consequence of (BB). The result is now immediate from Corollary 4.5. ■
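The constraint on $\delta$ used in this proof is plain arithmetic and can be checked exactly. The sketch below (function names are hypothetical, introduced only for this illustration) confirms on rational values that $(\delta - 2)\epsilon < -1$ holds exactly when $\delta < 2 - 1/\epsilon$, and prints half of that upper bound, $1 - 1/(2\epsilon)$.

```python
from fractions import Fraction


def delta_upper_bound(eps):
    """Upper limit 2 - 1/eps for delta when a_t = t**eps with eps in (1/2, 1]."""
    eps = Fraction(eps)
    if not Fraction(1, 2) < eps <= 1:
        raise ValueError("eps is expected to lie in (1/2, 1]")
    return 2 - 1 / eps


def first_part_of_b3_summable(eps, delta):
    """True when sum_t t**((delta - 2) * eps) converges, i.e. the exponent is below -1."""
    return (Fraction(delta) - 2) * Fraction(eps) < -1


if __name__ == "__main__":
    for eps in (Fraction(3, 5), Fraction(3, 4), Fraction(1)):
        bound = delta_upper_bound(eps)
        print(eps, bound, bound / 2)
```

Exact rational arithmetic is used so that the boundary case $\delta = 2 - 1/\epsilon$, where the exponent equals $-1$ and the series diverges, is not blurred by rounding.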

Remark 4.8 Suppose that $a_t = t^{\epsilon}$ with $\epsilon \in (1/2, 1)$ and
$\sup_t E\{\|\varepsilon_t(z)\|^2 \mid \mathcal{F}_{t-1}\} < \infty$ (e.g., assume that $\varepsilon_t = \varepsilon_t(z)$ are state independent and i.i.d.). Then, since
$(\delta - 2)\epsilon < -1$, condition (BB) in Corollary 4.7 automatically holds for any $\delta < 2 - 1/\epsilon$.
It therefore follows that the step-size sequence $a_t = t^{\epsilon}$, $\epsilon \in (1/2, 1)$, produces SA
procedures which converge with the rate $t^{-\alpha}$ where $\alpha < 1 - \frac{1}{2\epsilon}$. For example, the
step-size $a_t = t^{3/4}$ would produce SA procedures which converge with the rate $t^{-1/3}$.

5 Special models and examples


5.1 Finding a root of a polynomial 

Let $l$ be a positive integer and

$$R(z) = -\sum_{i=1}^{l} C_i (z - z^0)^i,$$

where $z, z^0 \in \mathbb{R}$ and $C_i$ are real constants. Suppose that

$$(z - z^0) R(z) \le 0 \quad \text{for all } z \in \mathbb{R}.$$

Note that if $l > 1$, the SA without truncations fails to satisfy the standard condition
on the rate of growth at infinity. Therefore, one needs to use slowly expanding
truncations to slow down the growth of $R$ at infinity. Consider $Z_t$ defined by (4.1)
with a truncation sequence $U_t = [-u_t, u_t]$, where $u_t \to \infty$ is a sequence of
non-decreasing positive numbers. Suppose that

$$\sum_{t=1}^{\infty} u_{t-1}^{2l} a_t^{-2} < \infty. \qquad (5.1)$$

Then, provided that the measurement errors satisfy condition (H3) of Corollary 4.1,
$|Z_t - z^0|$ converges (P-a.s.) to a finite limit.

Indeed, condition (H1) of Corollary 4.1 trivially holds. For large $t$'s,

$$\sup_{z \in [-u_{t-1}, u_{t-1}]} \|R(z)\|^2
\le \sup_{z \in [-u_{t-1}, u_{t-1}]} \left[ \sum_{i=1}^{l} C_i (z - z^0)^i \right]^2
\le l \sum_{i=1}^{l} C_i^2 (2u_{t-1})^{2i}
\le l\, 4^l \Big( \sum_{i=1}^{l} C_i^2 \Big) u_{t-1}^{2l},$$

which, by (5.1), implies condition (H2) of Corollary 4.1.
Furthermore, if $z^0$ is a unique root, then provided that

$$\sum_{t=1}^{\infty} a_t^{-1} = \infty, \qquad (5.2)$$

it follows from Corollary 4.1 that $Z_t \to z^0$ (P-a.s.). One can always choose a
suitable truncation sequence which satisfies (5.1) and (5.2). For example, if the
degree of the polynomial is known to be $l$ (or at most $l$), and $a_t = t$, then one can
take $u_t = C t^{r/2l}$, where $C$ and $r$ are some positive constants and $r < 1$. One can also
take a truncation sequence which is independent of $l$, e.g., $u_t = C \log t$, where $C$ is a
positive constant.

Suppose also that

$$C_1 \ge \frac{1}{2}, \qquad a_t = t^{\epsilon} \ \text{ where } \epsilon \in (0, 1],$$

and condition (BB) in Corollary 4.7 holds (e.g., one can assume for simplicity that the
$\varepsilon_t$'s are state independent and i.i.d.). Then $t^{\alpha}(Z_t - z^0) \to 0$ for any $\alpha < 1 - 1/(2\epsilon)$.

Indeed, since $R'(z^0) = -C_1 \le -1/2$, condition (B1) of Corollary 4.5 holds. Now,
the above convergence is a consequence of Corollary 4.7 and Remark 4.8.
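The procedure of this subsection can be sketched in a few lines of Python. The sketch assumes that the recursion has the truncated Robbins-Monro form $Z_t = \Phi_{U_t}(Z_{t-1} + a_t^{-1}[R(Z_{t-1}) + \varepsilon_t])$, with $\Phi_{U_t}$ the projection onto $U_t = [-u_t, u_t]$. That form, the cubic $R(z) = -(z - z^0) - (z - z^0)^3$ with $z^0 = 1$, the Gaussian noise and all function names are assumptions chosen for illustration. It uses $a_t = t$ and $u_t = C t^{r/2l}$ with $l = 3$, $r = 0.9$.

```python
import random


def polynomial_r(z, z0=1.0, coeffs=(1.0, 0.0, 1.0)):
    """R(z) = -sum_i C_i (z - z0)^i with C = (1, 0, 1), so l = 3 and C_1 >= 1/2."""
    d = z - z0
    return -sum(c * d ** (i + 1) for i, c in enumerate(coeffs))


def bound(t, c=2.0, r=0.9, degree=3):
    """Truncation radius u_t = c * t^(r / (2 l)): non-decreasing and unbounded."""
    return c * t ** (r / (2 * degree))


def truncated_sa(steps, z_init=0.5, noise_sd=1.0, seed=0):
    """Return the path Z_1, ..., Z_steps of the truncated procedure with a_t = t."""
    rng = random.Random(seed)
    z = z_init
    path = []
    for t in range(1, steps + 1):
        noisy_r = polynomial_r(z) + rng.gauss(0.0, noise_sd)
        candidate = z + noisy_r / t            # step size a_t^{-1} = 1 / t
        u_t = bound(t)
        z = min(max(candidate, -u_t), u_t)     # projection onto U_t = [-u_t, u_t]
        path.append(z)
    return path


if __name__ == "__main__":
    print(truncated_sa(5000)[-1])
    print(truncated_sa(2000, noise_sd=0.0)[-1])
```

With `noise_sd=0.0` the recursion is deterministic and the iterates approach the root $z^0 = 1$; with noise, the final value depends on the seed, so no particular output is claimed here.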

5.2 Linear procedures 

Consider the recursive procedure

$$Z_t = Z_{t-1} + \gamma_t (h_t - \beta_t Z_{t-1}), \qquad (5.3)$$

where $\gamma_t$ is a predictable positive definite matrix process, $\beta_t$ is a predictable
positive semi-definite matrix process and $h_t$ is an adapted vector process (i.e., $h_t$ is
$\mathcal{F}_t$-measurable for $t \ge 1$). If we assume that $E\{h_t \mid \mathcal{F}_{t-1}\} = \beta_t z^0$, we can view (5.3)
as a SA procedure designed to find the common root $z^0$ of the linear functions

$$R_t(u) = E\{h_t - \beta_t u \mid \mathcal{F}_{t-1}\} = E\{h_t \mid \mathcal{F}_{t-1}\} - \beta_t u = \beta_t (z^0 - u),$$

which is observed with the random noise

$$\varepsilon_t(u) = h_t - \beta_t u - R_t(u) = h_t - E\{h_t \mid \mathcal{F}_{t-1}\} = h_t - \beta_t z^0.$$

Corollary 5.1 Suppose that $Z_t$ is defined by (5.3) with $E(h_t \mid \mathcal{F}_{t-1}) = \beta_t z^0$. Suppose
also that $a_t$ is a non-decreasing positive predictable process and

(G1) $\Delta\gamma_t^{-1} - 2\beta_t + \beta_t \gamma_t \beta_t$ is negative semi-definite eventually;

(G2)
$$\sum_{t=1}^{\infty} a_t^{-1} E\{(h_t - \beta_t z^0)^T \gamma_t (h_t - \beta_t z^0) \mid \mathcal{F}_{t-1}\} < \infty.$$

Then $a_t^{-1}(Z_t - z^0)^T \gamma_t^{-1} (Z_t - z^0)$ converges to a finite limit (P-a.s.).

Proof. Let us show that the conditions of Lemma 3.1 hold with $V_t(u) = a_t^{-1} u^T \gamma_t^{-1} u$.
Condition (V1) trivially holds. We have $V_t'(u) = 2a_t^{-1} u^T \gamma_t^{-1}$, $V_t''(u) = 2a_t^{-1}\gamma_t^{-1}$,
$R_t(z^0 + u) = -\beta_t u$ and $R_t(u) + \varepsilon_t(u) = h_t - \beta_t u$. Since $E(h_t - \beta_t z^0 \mid \mathcal{F}_{t-1}) = 0$, for $\eta_t$
defined in (V2) we have

$$\eta_t(z^0 + u) = a_t^{-1} E\{(h_t - \beta_t z^0 - \beta_t u)^T \gamma_t (h_t - \beta_t z^0 - \beta_t u) \mid \mathcal{F}_{t-1}\}$$
$$= a_t^{-1} E\{(h_t - \beta_t z^0)^T \gamma_t (h_t - \beta_t z^0) \mid \mathcal{F}_{t-1}\} + a_t^{-1} (\beta_t u)^T \gamma_t (\beta_t u).$$

Also, since $a_t$ is non-decreasing,

$$\Delta V_t(u) = u^T \Delta\left[(a_t \gamma_t)^{-1}\right] u \le u^T (a_t \gamma_t)^{-1} u - u^T (a_t \gamma_{t-1})^{-1} u = u^T a_t^{-1} \Delta\gamma_t^{-1} u.$$

Denoting

$$J_t = a_t^{-1} E\{(h_t - \beta_t z^0)^T \gamma_t (h_t - \beta_t z^0) \mid \mathcal{F}_{t-1}\},$$

for $\mathcal{K}_t$ from (V2), we have

$$\mathcal{K}_t(u) \le a_t^{-1} u^T \Delta\gamma_t^{-1} u - 2a_t^{-1} u^T \beta_t u + a_t^{-1} u^T \beta_t \gamma_t \beta_t u + J_t
= a_t^{-1} u^T (\Delta\gamma_t^{-1} - 2\beta_t + \beta_t \gamma_t \beta_t) u + J_t.$$

Condition (V2) is now immediate from (G1) and (G2) since

$$[1 + V_{t-1}(\Delta_{t-1})]^{-1} [\mathcal{K}_t(\Delta_{t-1})]^+ \le [\mathcal{K}_t(\Delta_{t-1})]^+ \le J_t.$$

Thus, all the conditions of Lemma 3.1 hold, which implies the required result. ■

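Corollary 5.2 Suppose that $\Delta\gamma_t^{-1} = \beta_t$. Then (G1) in Corollary 5.1 holds.

Proof. Since $\Delta\gamma_t^{-1}$ is positive semi-definite, it follows that $\Delta\gamma_t$ is negative
semi-definite (see Horn and Johnson (1985), Corollary 7.7.4(a)). Also, since

$$\Delta\gamma_t^{-1} - 2\beta_t + \beta_t \gamma_t \beta_t = -\Delta\gamma_t^{-1} + \Delta\gamma_t^{-1} \gamma_t \Delta\gamma_t^{-1}
= -\Delta\gamma_t^{-1} + \gamma_t^{-1} - 2\gamma_{t-1}^{-1} + \gamma_{t-1}^{-1} \gamma_t \gamma_{t-1}^{-1}$$
$$= -\gamma_{t-1}^{-1} + \gamma_{t-1}^{-1} (\gamma_{t-1} + \Delta\gamma_t) \gamma_{t-1}^{-1} = \gamma_{t-1}^{-1} \Delta\gamma_t \gamma_{t-1}^{-1},$$

it follows that (G1) holds. ■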
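The matrix identity in this proof can be checked exactly on a small example. The Python sketch below uses $2 \times 2$ matrices with rational entries and a rank-one $\beta_t = x x^T$; the helper names and the specific numbers are assumptions made for the illustration only.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_lin(a, b, coef):
    """Return a + coef * b entrywise."""
    return [[a[i][j] + coef * b[i][j] for j in range(2)] for i in range(2)]


def inv2(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def g1_matrices(gamma_prev_inv, x):
    """Return (lhs, rhs) of the identity in the proof, taking beta_t = x x^T."""
    beta = [[x[i] * x[j] for j in range(2)] for i in range(2)]
    gamma_inv = mat_lin(gamma_prev_inv, beta, 1)      # gamma_t^{-1} = gamma_{t-1}^{-1} + beta_t
    gamma, gamma_prev = inv2(gamma_inv), inv2(gamma_prev_inv)
    # lhs = Delta gamma_t^{-1} - 2 beta_t + beta_t gamma_t beta_t, with Delta gamma_t^{-1} = beta_t
    lhs = mat_lin(mat_lin(beta, beta, -2), mat_mul(mat_mul(beta, gamma), beta), 1)
    # rhs = gamma_{t-1}^{-1} (gamma_t - gamma_{t-1}) gamma_{t-1}^{-1}
    rhs = mat_mul(mat_mul(gamma_prev_inv, mat_lin(gamma, gamma_prev, -1)), gamma_prev_inv)
    return lhs, rhs


def is_negative_semidefinite(m):
    """Test for a symmetric 2x2 matrix via its diagonal entries and determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] <= 0 and m[1][1] <= 0 and det >= 0


if __name__ == "__main__":
    prev_inv = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
    lhs, rhs = g1_matrices(prev_inv, (Fraction(1), Fraction(2)))
    print(lhs == rhs, is_negative_semidefinite(lhs))
```

Rational arithmetic makes the equality test exact; the inputs must be `Fraction` values, since integer inputs would be divided as floats inside `inv2`.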
5.3 Parameter estimation in Autoregressive models

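Consider an AR(m) process

$$X_t = \theta^{(1)} X_{t-1} + \theta^{(2)} X_{t-2} + \cdots + \theta^{(m)} X_{t-m} + \xi_t = \theta^T X_{t-m}^{t-1} + \xi_t,$$

where $\theta = (\theta^{(1)}, \dots, \theta^{(m)})^T$, $X_{t-m}^{t-1} = (X_{t-1}, \dots, X_{t-m})^T$ and $\xi_t$ is a martingale-difference
(i.e., $E\{\xi_t \mid \mathcal{F}_{t-1}\} = 0$). If the pdf of $\xi_t$ w.r.t. Lebesgue's measure is $g_t(x)$, then the
conditional probability density function of $X_t$ given the past observations is

$$f_t(x, \theta \mid X_1^{t-1}) = f_t(x, \theta \mid X_{t-m}^{t-1}) = g_t(x - \theta^T X_{t-m}^{t-1})$$

and

$$\frac{f_t'^{\,T}(\theta, x \mid X_1^{t-1})}{f_t(\theta, x \mid X_1^{t-1})} = -\frac{g_t'(x - \theta^T X_{t-m}^{t-1})}{g_t(x - \theta^T X_{t-m}^{t-1})}\, X_{t-m}^{t-1}.$$

It is easy to see that the conditional Fisher information (1.4) is

$$I_t = \sum_{s=1}^{t} l_{g_s} X_{s-m}^{s-1} (X_{s-m}^{s-1})^T \quad \text{where} \quad l_{g_t} = \int \left( \frac{g_t'(x)}{g_t(x)} \right)^2 g_t(x)\, dx.$$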


The inverse $I_t^{-1}$ can also be generated recursively by

$$I_t^{-1} = I_{t-1}^{-1} - l_{g_t} I_{t-1}^{-1} X_{t-m}^{t-1} \left( 1 + l_{g_t} (X_{t-m}^{t-1})^T I_{t-1}^{-1} X_{t-m}^{t-1} \right)^{-1} (X_{t-m}^{t-1})^T I_{t-1}^{-1}. \qquad (5.4)$$

(Note that this can be derived either directly, or using the matrix inversion formula,
sometimes referred to as the Sherman-Morrison formula.)

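Thus, the on-line likelihood procedure in this case can be derived by the following
recursion

$$\hat\theta_t = \hat\theta_{t-1} - I_t^{-1} X_{t-m}^{t-1}\, \frac{g_t'}{g_t}\big( X_t - \hat\theta_{t-1}^T X_{t-m}^{t-1} \big), \qquad (5.5)$$

where $I_t^{-1}$ is also derived on-line using formula (5.4). In general, to include robust
estimation procedures, and also to use any available auxiliary information, one can
use the following class of procedures

$$\hat\theta_t = \Phi_{U_t}\Big( \hat\theta_{t-1} + \gamma_t H(X_{t-m}^{t-1})\, \varphi_t\big( X_t - \hat\theta_{t-1}^T X_{t-m}^{t-1} \big) \Big), \qquad (5.6)$$

where $\varphi_t : \mathbb{R} \to \mathbb{R}$ and $H : \mathbb{R}^m \to \mathbb{R}^m$ are suitably chosen functions and $\gamma_t$ is an
$m \times m$ matrix valued step-size sequence.

Example 5.3 (Recursive least squares procedures) The recursive least squares (RLS)
estimator of $\theta = (\theta^{(1)}, \dots, \theta^{(m)})^T$ is generated by the following procedure

$$\hat\theta_t = \hat\theta_{t-1} + \hat I_t^{-1} X_{t-m}^{t-1} \left[ X_t - (X_{t-m}^{t-1})^T \hat\theta_{t-1} \right], \qquad (5.7)$$

$$\hat I_t^{-1} = \hat I_{t-1}^{-1} - \hat I_{t-1}^{-1} X_{t-m}^{t-1} \left[ 1 + (X_{t-m}^{t-1})^T \hat I_{t-1}^{-1} X_{t-m}^{t-1} \right]^{-1} (X_{t-m}^{t-1})^T \hat I_{t-1}^{-1}, \qquad (5.8)$$

where $\hat\theta_0$ and a positive definite $\hat I_0^{-1}$ are some starting values. Note that (5.7) is a
particular case of (5.6), and it also coincides with the maximum likelihood procedure
(5.5) in the case when $\xi_t$ are i.i.d. Gaussian r.v.'s.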
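The RLS recursion of Example 5.3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the zero starting value for the estimate, the choice of $\hat I_0^{-1}$ as a multiple of the identity, and the Gaussian innovations in the simulator are all hypothetical choices.

```python
import random


def simulate_ar(theta, n, noise_sd=1.0, seed=0, init=None):
    """Generate X_0..X_{n-1} from X_t = theta^T (X_{t-1}, ..., X_{t-m}) + xi_t."""
    m = len(theta)
    rng = random.Random(seed)
    xs = list(init) if init is not None else [0.0] * m
    while len(xs) < n:
        past = xs[-1:-m - 1:-1]                      # (X_{t-1}, ..., X_{t-m})
        xs.append(sum(th * x for th, x in zip(theta, past)) + rng.gauss(0.0, noise_sd))
    return xs


def rls_ar(series, m, prior_precision=1.0):
    """Run the two RLS recursions on a scalar series; returns (theta_hat, inv_info)."""
    theta = [0.0] * m
    # inv_info plays the role of the inverse information matrix, started at identity / prior_precision
    inv_info = [[(1.0 / prior_precision if i == j else 0.0) for j in range(m)] for i in range(m)]
    for t in range(m, len(series)):
        x = [series[t - k] for k in range(1, m + 1)]
        px = [sum(inv_info[i][j] * x[j] for j in range(m)) for i in range(m)]
        denom = 1.0 + sum(x[i] * px[i] for i in range(m))
        # rank-one (Sherman-Morrison) update of the inverse; inv_info stays symmetric
        inv_info = [[inv_info[i][j] - px[i] * px[j] / denom for j in range(m)] for i in range(m)]
        gain = [sum(inv_info[i][j] * x[j] for j in range(m)) for i in range(m)]
        resid = series[t] - sum(x[i] * theta[i] for i in range(m))
        theta = [theta[i] + gain[i] * resid for i in range(m)]
    return theta, inv_info


def batch_information(series, m, prior_precision=1.0):
    """Directly accumulated information matrix: prior_precision * I + sum of x x^T."""
    info = [[(prior_precision if i == j else 0.0) for j in range(m)] for i in range(m)]
    for t in range(m, len(series)):
        x = [series[t - k] for k in range(1, m + 1)]
        for i in range(m):
            for j in range(m):
                info[i][j] += x[i] * x[j]
    return info


if __name__ == "__main__":
    data = simulate_ar((0.5, -0.3), 2000, seed=1)
    print(rls_ar(data, 2)[0])
```

A useful sanity check is that the recursively updated inverse, multiplied by the directly accumulated matrix from `batch_information`, is the identity up to rounding error.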


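Corollary 5.4 Consider $\hat\theta_t$ defined by (5.7) and (5.8). Suppose that there exists a
non-decreasing sequence $a_t > 0$ such that

$$\sum_{t=1}^{\infty} a_t^{-1} (X_{t-m}^{t-1})^T \hat I_t^{-1} X_{t-m}^{t-1}\, E\{\xi_t^2 \mid \mathcal{F}_{t-1}\} < \infty.$$

Then $a_t^{-1}(\hat\theta_t - \theta)^T \hat I_t (\hat\theta_t - \theta)$ converges to a finite limit ($P^{\theta}$-a.s.).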

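Proof. Let us check the conditions of Corollary 5.1. Obviously, the matrix
$\gamma_t^{-1} = \hat I_t = \hat I_0 + \sum_{s=1}^{t} X_{s-m}^{s-1} (X_{s-m}^{s-1})^T$ is positive definite and
$\Delta\gamma_t^{-1} = \beta_t = X_{t-m}^{t-1} (X_{t-m}^{t-1})^T$ is positive semi-definite. By Corollary 5.2, condition (G1) holds. We also have

$$\sum_{t=1}^{\infty} a_t^{-1} E\{(h_t - \beta_t z^0)^T \gamma_t (h_t - \beta_t z^0) \mid \mathcal{F}_{t-1}\}
= \sum_{t=1}^{\infty} a_t^{-1} E\{\xi_t (X_{t-m}^{t-1})^T \hat I_t^{-1} X_{t-m}^{t-1} \xi_t \mid \mathcal{F}_{t-1}\}$$
$$= \sum_{t=1}^{\infty} a_t^{-1} (X_{t-m}^{t-1})^T \hat I_t^{-1} X_{t-m}^{t-1}\, E\{\xi_t^2 \mid \mathcal{F}_{t-1}\} < \infty.$$

So condition (G2) holds. Hence all conditions of Corollary 5.1 hold, which completes
the proof. ■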

Corollary 5.5 Consider $\hat\theta_t$ defined by (5.7) and (5.8). Suppose that

(P1) there exists a non-decreasing sequence $\kappa_t \to \infty$ such that

$$\hat I_t / \kappa_t \to G,$$

where $G < \infty$ is a positive definite $m \times m$ matrix;

(P2) there exists $\epsilon^0 \in [0, 1)$ such that

$$E\{\xi_t^2 \mid \mathcal{F}_{t-1}\} \le \kappa_t^{\epsilon^0} \quad \text{eventually.}$$

Then $\kappa_t^{1-\delta} \|\hat\theta_t - \theta\|^2 \to 0$ ($P^{\theta}$-a.s.) for all $\delta \in (\epsilon^0, 1]$.


Proof. Consider Corollary 5.4 with $a_t = \kappa_t^{\delta}$ for a certain $\delta \in (\epsilon^0, 1]$. By (P2), there
exists $t^0$ such that

$$\sum_{t=t^0}^{\infty} a_t^{-1} (X_{t-m}^{t-1})^T \hat I_t^{-1} X_{t-m}^{t-1}\, E\{\xi_t^2 \mid \mathcal{F}_{t-1}\}
\le \sum_{t=t^0}^{\infty} \kappa_t^{\epsilon^0 - \delta} (X_{t-m}^{t-1})^T \hat I_t^{-1} X_{t-m}^{t-1}.$$

Now, using (P1) and Lemma 6.4 in the Appendix, the above sum converges
to a finite limit, implying that the conditions of Corollary 5.4 hold. Therefore,
$(\hat\theta_t - \theta)^T \hat I_t (\hat\theta_t - \theta) / \kappa_t^{\delta}$ tends to a finite limit. Now, the assertion of the corollary
follows since $\hat I_t / \kappa_t$ converges to a finite matrix. ■

Remark 5.6 (a) If $X_t$ is a strongly stationary process, condition (P1) will trivially
hold with $\kappa_t = t$. However, using the results given above, convergence can be derived
without the stationarity requirement as long as $\hat I_t / \kappa_t$ tends to a
positive definite matrix.

(b) Condition (P2) demonstrates that the requirements on the innovations are
quite weak. In particular, the conditional variances of the innovations do not have
to be bounded w.r.t. $t$. For example, if $\kappa_t = t$ and the variances go to infinity not
faster than $t^{\epsilon_0}$ (for some $0 < \epsilon_0 < 1$), then it follows that $t^{1-\delta}\|\hat\theta_t - \theta\|^2 \to 0$ for any
$\delta \in (\epsilon_0, 1)$.

(c) It follows from (a) and (b) above that in the case of a strongly stationary $X_t$ with
i.i.d. innovations, $t^{1-\delta}\|\hat\theta_t - \theta\|^2 \to 0$ for any $\delta > 0$ without any additional assumptions.


6 Appendix 

Lemma 6.1 Let $\mathcal{F}_0, \mathcal{F}_1, \dots$ be a non-decreasing sequence of $\sigma$-algebras and $X_n$,
$\beta_n$, $\xi_n$, $\zeta_n \in \mathcal{F}_n$, $n \ge 0$, be non-negative random variables such that

$$E(X_n \mid \mathcal{F}_{n-1}) \le X_{n-1}(1 + \beta_{n-1}) + \xi_{n-1} - \zeta_{n-1}, \quad n \ge 1,$$

eventually. Then

$$\Big\{ \sum_{i=1}^{\infty} \xi_{i-1} < \infty \Big\} \cap \Big\{ \sum_{i=1}^{\infty} \beta_{i-1} < \infty \Big\} \subseteq \{X \to\} \cap \Big\{ \sum_{i=1}^{\infty} \zeta_{i-1} < \infty \Big\} \quad \text{P-a.s.},$$

where $\{X \to\}$ denotes the set where $\lim_{n \to \infty} X_n$ exists and is finite.

Proof. The proof can be found in Robbins and Siegmund (1985). Note also that
this lemma is a special case of the theorem on the convergence sets of non-negative
semi-martingales (see, e.g., Lazrieva et al. (1997)). ■

Proposition 6.2 Consider a closed sphere $U = S(\alpha, r)$ in $\mathbb{R}^m$ with the center at
$\alpha \in \mathbb{R}^m$ and the radius $r$. Let $z^0 \in U$ and $z \notin U$. Denote by $z'$ the point of $U$
closest to $z$, that is,

$$z' = \alpha + \frac{r}{\|z - \alpha\|}(z - \alpha).$$

Suppose also that $C$ is a positive definite matrix such that

$$\lambda_C^{max} v^2 \le \lambda_C^{min} r^2,$$

where $\lambda_C^{max}$ and $\lambda_C^{min}$ are the largest and smallest eigenvalues of $C$ respectively and
$v = \|\alpha - z^0\|$. Then

$$(z' - z^0)^T C (z' - z^0) \le (z - z^0)^T C (z - z^0).$$

Proof. For $u, w \in \mathbb{R}^m$, define

$$\|u\|_C = (u^T C u)^{1/2} \quad \text{and} \quad (u, w)_C = u^T C w.$$

We have

$$|(z^0 - \alpha, z' - \alpha)_C| \le \|z^0 - \alpha\|_C \|z' - \alpha\|_C \le \sqrt{\lambda_C^{max}}\, v\, \|z' - \alpha\|_C$$
$$\le \sqrt{\lambda_C^{min}}\, r\, \|z' - \alpha\|_C = \sqrt{\lambda_C^{min}}\, \|z' - \alpha\| \|z' - \alpha\|_C \le \|z' - \alpha\|_C^2. \qquad (6.1)$$

Since $z \notin U$, we have

$$z' = \alpha + \frac{r}{\|z - \alpha\|}(z - \alpha) = (1 - \delta)\alpha + \delta z,$$

where $\delta = r / \|z - \alpha\| < 1$. Then, since

$$z - z' = (1 - \delta)(z - \alpha), \quad z' - \alpha = \delta(z - \alpha), \quad z - z' = \frac{1 - \delta}{\delta}(z' - \alpha),$$

by (6.1),

$$(z' - z^0, z - z')_C = (z' - \alpha, z - z')_C + (\alpha - z^0, z - z')_C
= \frac{1 - \delta}{\delta} \|z' - \alpha\|_C^2 - \frac{1 - \delta}{\delta} (z^0 - \alpha, z' - \alpha)_C \ge 0.$$

Therefore, since $z' - z^0 = (z - z^0) - (z - z')$, we get

$$\|z' - z^0\|_C^2 = \|z - z^0\|_C^2 + \|z - z'\|_C^2 - 2(z - z^0, z - z')_C$$
$$= \|z - z^0\|_C^2 + \|z - z'\|_C^2 - 2\|z - z'\|_C^2 - 2(z' - z^0, z - z')_C$$
$$= \|z - z^0\|_C^2 - \|z - z'\|_C^2 - 2(z' - z^0, z - z')_C \le \|z - z^0\|_C^2. \quad ■$$
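Proposition 6.2 can be checked numerically on a grid. The sketch below works in $\mathbb{R}^2$ with a diagonal matrix $C$, so that the eigenvalues of $C$ are its diagonal entries; the function names, the grid and the particular values of $\alpha$, $r$, $z^0$ and $C$ are assumptions chosen for illustration.

```python
import math


def project_to_ball(z, center, radius):
    """Closest point z' of the closed ball S(center, radius) to z."""
    dist = math.dist(z, center)
    if dist <= radius:
        return tuple(z)
    scale = radius / dist
    return tuple(c + scale * (zi - c) for zi, c in zip(z, center))


def c_norm_sq(u, c_diag):
    """u^T C u for a diagonal positive definite C = diag(c_diag)."""
    return sum(ci * ui * ui for ci, ui in zip(c_diag, u))


def count_violations(center, radius, z0, c_diag, half_width=10, step=0.5):
    """Count grid points z outside the ball for which the projection increases the C-distance to z0."""
    v_sq = math.dist(center, z0) ** 2
    # eigenvalue condition of the proposition: lambda_max * v^2 <= lambda_min * r^2
    assert max(c_diag) * v_sq <= min(c_diag) * radius ** 2
    violations = 0
    for i in range(-half_width, half_width + 1):
        for j in range(-half_width, half_width + 1):
            z = (center[0] + i * step, center[1] + j * step)
            if math.dist(z, center) <= radius:
                continue                      # the proposition only concerns z outside U
            zp = project_to_ball(z, center, radius)
            before = c_norm_sq([a - b for a, b in zip(z, z0)], c_diag)
            after = c_norm_sq([a - b for a, b in zip(zp, z0)], c_diag)
            if after > before + 1e-12:        # small tolerance for rounding
                violations += 1
    return violations


if __name__ == "__main__":
    print(count_violations((0.0, 0.0), 2.0, (1.0, 0.5), (2.0, 1.0)))
```

Here $\lambda_C^{max} v^2 = 2 \cdot 1.25 = 2.5 \le 4 = \lambda_C^{min} r^2$, so the assumption of the proposition is satisfied and no violations should be found.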


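Proposition 6.3 Suppose $a_t$, $t \in \mathbb{N}$, is a non-decreasing sequence of positive numbers
such that

$$\sum_{t=1}^{\infty} \frac{1}{a_t} < \infty.$$

Then

$$\sum_{t=1}^{\infty} \left[ \frac{a_{t+1} - a_t - 1}{a_t} \right]^+ = +\infty.$$

Proof. Since

$$\sum_{t=1}^{\infty} \left[ \frac{a_{t+1} - a_t - 1}{a_t} \right]^+ \ge \sum_{t=1}^{\infty} \frac{a_{t+1} - a_t}{a_t} - \sum_{t=1}^{\infty} \frac{1}{a_t}$$

and the last series converges, it is sufficient to show that

$$\sum_{t=1}^{\infty} \frac{a_{t+1} - a_t}{a_t} = +\infty.$$

Note that for $b \ge a > 0$, we have

$$\frac{b - a}{a} \ge \int_a^b \frac{dx}{x} = \ln b - \ln a.$$

So,

$$\sum_{t=1}^{N} \frac{a_{t+1} - a_t}{a_t} \ge \sum_{t=1}^{N} (\ln a_{t+1} - \ln a_t) = \ln a_{N+1} - \ln a_1 \to +\infty \quad \text{as } N \to \infty. \quad ■$$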


Lemma 6.4 Suppose $\{a_t\}$ is a sequence of real $m \times 1$ column vectors, $I_t = \mathbf{I} + \sum_{s=1}^{t} a_s a_s^T$
diverges and $\kappa_t$ is a sequence of positive numbers satisfying

$$I_t / \kappa_t \to G,$$

where $G$ is a finite positive definite $m \times m$ matrix. Then

$$\sum_{t=N}^{\infty} \kappa_t^{-\delta} a_t^T I_t^{-1} a_t < \infty$$

for any $\delta > 0$.

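Proof. Since $\mathrm{tr}(I_t) = m + \sum_{s=1}^{t} a_s^T a_s$ is a non-decreasing sequence of positive
numbers, we have (see Proposition A2 in Sharia (2007))

$$\sum_{t=1}^{\infty} \frac{a_t^T a_t}{[\mathrm{tr}(I_t)]^{1+\delta}} = \sum_{t=1}^{\infty} \frac{a_t^T a_t}{\left( m + \sum_{s=1}^{t} a_s^T a_s \right)^{1+\delta}} < \infty.$$

Since $I_t / \kappa_t$ converges, we have that $\mathrm{tr}(I_t) / \kappa_t$ tends to a finite limit, and

$$\sum_{t=1}^{\infty} \frac{a_t^T a_t}{\kappa_t^{1+\delta}} = \sum_{t=1}^{\infty} \frac{a_t^T a_t}{[\mathrm{tr}(I_t)]^{1+\delta}} \left( \frac{\mathrm{tr}(I_t)}{\kappa_t} \right)^{1+\delta} < \infty.$$

Finally, since $G$ is positive definite, we have $\kappa_t I_t^{-1} \to G^{-1}$, and it follows that
$\kappa_t \lambda_t^{max}$ converges to a finite limit, where $\lambda_t^{max}$ is the largest eigenvalue of $I_t^{-1}$. Thus,

$$\sum_{t=1}^{\infty} \kappa_t^{-\delta} a_t^T I_t^{-1} a_t \le \sum_{t=1}^{\infty} \frac{a_t^T a_t}{\kappa_t^{1+\delta}} \cdot \kappa_t \lambda_t^{max} < \infty. \quad ■$$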